Discovering Semantic Sibling Groups from Web Documents with XTREEM-SG

Authors

  • Marko Brunzel
  • Myra Spiliopoulou

Abstract

The acquisition of explicit semantics is still a research challenge. Approaches to semantics extraction focus mostly on learning hierarchical hypernym-hyponym relations. Sibling semantics, namely co-hyponym and co-meronym relations, are extracted to a much lesser extent, although they are no less important in ontology engineering. In this paper we describe and evaluate the XTREEM-SG (Xhtml TREE Mining for Sibling Groups) approach for finding sibling semantics in semi-structured Web documents. XTREEM exploits the added value of the mark-up available in Web content to group text siblings, and we show that this grouping is semantically meaningful. The XTREEM-SG approach is domain- and language-independent: it relies on no background knowledge, NLP software or training. We apply XTREEM-SG and evaluate it against the reference semantics of two gold-standard ontologies. We investigate how variations in input, parameters and reference influence the results of structuring a closed vocabulary by sibling relations. Earlier methods that evaluate sibling relations against a gold standard report an F-measure of 14.18%; our method improves this to 21.47%.
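
The abstract says XTREEM-SG uses the mark-up of Web documents to group text siblings, but the algorithm itself is not given on this page. The Python sketch below is therefore only an illustration under simplifying assumptions, not the authors' method. The functions `sibling_groups`, `sibling_pairs` and `f_measure`, the sample page and the reference pairs are all hypothetical. The sketch treats the text of same-tag children of one parent element (for example the `<li>` items of a `<ul>`) as a candidate sibling group. It then expands each group into unordered term pairs and scores them with a pairwise F-measure against a reference set. The paper's actual evaluation protocol may differ.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def sibling_groups(xhtml: str, min_size: int = 2) -> list[list[str]]:
    """Hypothetical grouping: texts of same-tag children under one parent.

    Simplification: only each child's leading text (child.text) is used,
    lower-cased and de-duplicated; groups smaller than min_size are dropped.
    """
    root = ET.fromstring(xhtml.strip())
    groups = []
    for parent in root.iter():  # pre-order walk over all elements
        by_tag: dict[str, list[str]] = {}
        for child in parent:
            text = (child.text or "").strip().lower()
            if text:
                by_tag.setdefault(child.tag, []).append(text)
        for texts in by_tag.values():
            unique = sorted(set(texts))
            if len(unique) >= min_size:
                groups.append(unique)
    return groups


def sibling_pairs(groups: list[list[str]]) -> set[frozenset[str]]:
    """Expand every group into unordered pairs of sibling terms."""
    return {frozenset(pair) for group in groups for pair in combinations(group, 2)}


def f_measure(found: set, reference: set) -> float:
    """Pairwise F1 of found sibling pairs against reference pairs."""
    tp = len(found & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(found)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy page and reference pairs, for illustration only.
doc = """
<html><body>
  <h2>Fruit</h2>
  <ul><li>Apple</li><li>Pear</li><li>Plum</li></ul>
  <h2>Vegetables</h2>
  <ul><li>Carrot</li><li>Leek</li></ul>
</body></html>
"""
reference = {frozenset(p) for p in [("apple", "pear"), ("apple", "plum"),
                                    ("pear", "plum"), ("carrot", "leek"),
                                    ("carrot", "onion")]}

if __name__ == "__main__":
    groups = sibling_groups(doc)
    print(groups)
    print("F =", round(f_measure(sibling_pairs(groups), reference), 3))
```

Because only each child's leading text (`child.text`) is read, items whose label sits inside nested tags, such as `<li><a>Apple</a></li>`, are skipped. A fuller treatment would have to decide which descendant text counts as an item's label.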

Similar resources

Discovering Semantic Sibling Associations from Web Documents with XTREEM-SP

The semi-automatic extraction of semantics for ontology enhancement or semantic-based information retrieval encompasses several open challenges. There are many findings on the identification of vertical relations among concepts, but much less on indirect, horizontal relations among concepts that share a common, a priori unknown parent, such as Co-Hyponyms and Co-Meronyms. We propose the method X...

Acquiring Semantic Sibling Associations from Web Documents

The automated discovery of relationships among terms contributes to the automation of the ontology engineering process and allows for sophisticated query expansion in information retrieval. While there are many findings on the identification of direct hierarchical relations among concepts, less attention has been paid to the discovery of sibling terms. These are terms that share a common, a priori...

Discovering Multi Terms and Co-hyponymy from XHTML Documents with XTREEM

The Semantic Web needs ontologies as an integral component. Current methods for learning and enhancing ontologies need to be further improved to overcome the knowledge acquisition bottleneck. The identification of concepts and relations with only minimal user interaction is still a challenging objective. Current approaches performed to extract semantics often use association rules or clusterin...

Ontology Learning and Population: Bridging the Gap between Text and Knowledge

Ontology Learning is up to now dominated by techniques which use text as input. There are only few methods which use a different data source. The techniques which use highly structured data as input have the disadvantage that such data sources are rare. On the other hand, there are enormous amounts of Web content present today. We present the XTREEM (Xhtml TREE Mining) methods which enable Onto...

Automatically Discovering Semantic Links among Documents and Applications

Automatically discovering semantic links among documents is the basis of developing advanced applications on large-scale documentary resources. This paper proposes an approach to automatically discover semantic links in a given document set. It has the following advantages: (1) It does not rely on any predefined ontology. (2) The semantic link networks and relevant rules automatically ...

Publication date: 2006